---
permalink: /textanalysis/
keywords: fastai
description: "Wordclouds and sentiment analysis of Game of Thrones wiki pages and series dialogue"
title: Text analysis
toc: false
branch: master
badges: true
comments: true
categories: [text analysis, sentiment analysis, wordclouds]
image: images/some_folder/your_image.png
hide: false
search_exclude: false
metadata_key1: metadata_value1
metadata_key2: metadata_value2
nb_path: _notebooks/05_Text_Analysis.ipynb
layout: notebook
---

This section on text analysis is divided into two parts: wordclouds and sentiment analysis. Both the extracted wiki pages and the characters' dialogue will be used, and we will investigate how the wordclouds and sentiment scores differ between the two data sets.

## Wordclouds

First, we will take a look at wordclouds. As mentioned before, both the extracted wiki pages and the full series dialogue will be investigated. We will start by generating wordclouds for characters of interest; here, we have selected Jon Snow, Arya Stark, Bronn, Brienne of Tarth and Jaime Lannister. The first step in generating the wordclouds is to compute the term frequency-inverse document frequency (TF-IDF) for the respective text corpora, i.e. the wiki pages and the episode dialogues. For further explanation of TF-IDF and its computation, we refer to the Explainer Notebook.
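As a rough illustration of this step, TF-IDF can be computed with nothing but the standard library. The sketch below treats each character's text as one document and uses the plain tf · log(N/df) weighting; the function name and the toy documents are ours, not from the notebook:

```python
import math
from collections import Counter

def tf_idf(docs):
    """Compute TF-IDF scores for a dict of {name: list_of_tokens}."""
    n_docs = len(docs)
    # Document frequency: in how many documents does each term appear?
    df = Counter()
    for tokens in docs.values():
        df.update(set(tokens))
    # Per-document score: term frequency weighted by inverse document frequency.
    scores = {}
    for name, tokens in docs.items():
        tf = Counter(tokens)
        scores[name] = {term: count * math.log(n_docs / df[term])
                        for term, count in tf.items()}
    return scores

docs = {"jon": ["wall", "watch", "wall"],
        "arya": ["needle", "list", "wall"]}
scores = tf_idf(docs)
```

A term appearing in every document (here, "wall") gets a score of zero, which is exactly why TF-IDF surfaces character-specific words. The resulting per-character dictionaries can then be handed to a wordcloud generator, e.g. `WordCloud.generate_from_frequencies` from the `wordcloud` package.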

Now, let's take a look at the generated wordclouds for the selected characters.

### Wordclouds based on character wiki pages & dialogue


When comparing the wordclouds generated from the two data sets, it should be noted that, for the most part, the same words are not present for a given character. This is expected: the text from a character's wiki page describes the character and their place in the story, whereas the dialogue wordcloud reflects the words (ranked by TF-IDF) that the character actually uses throughout the series. It will be interesting to compare this with the sentiment analysis in the second part of this page.


### Wordclouds based on selected houses

Next, we will generate wordclouds based on the characters' allegiance. This is done by pooling together the wiki text of characters belonging to the same allegiance and, again, computing the respective TF-IDF scores used to generate the wordclouds. For this, we have selected the houses Stark, Lannister, Targaryen and Greyjoy, plus the independent group the Night's Watch. It will be interesting to see whether the houses' mottos appear in these wordclouds. The respective house mottos are:

- House Stark: Winter Is Coming
- House Lannister: Hear Me Roar!
- House Targaryen: Fire and Blood
- House Greyjoy: We Do Not Sow

As the Night's Watch is not a House but rather a brotherhood sworn to protect The Wall, they do not have a motto.
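The pooling itself is straightforward: concatenate the token lists of all characters that share an allegiance into one document per house, then score those pooled documents with TF-IDF as before. A minimal sketch (the character-to-house mapping below is a toy example, not the notebook's actual data):

```python
from collections import defaultdict

def pool_by_house(char_tokens, allegiance):
    """Merge per-character token lists into one document per house."""
    house_docs = defaultdict(list)
    for char, tokens in char_tokens.items():
        house = allegiance.get(char)
        if house is not None:  # skip characters with no known allegiance
            house_docs[house].extend(tokens)
    return dict(house_docs)

char_tokens = {"Jon Snow": ["wall", "watch"],
               "Arya Stark": ["winterfell"],
               "Sansa Stark": ["winterfell", "winter"]}
allegiance = {"Jon Snow": "Night's Watch",
              "Arya Stark": "Stark",
              "Sansa Stark": "Stark"}
house_docs = pool_by_house(char_tokens, allegiance)
```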


When looking at the wordclouds above and the respective house mottos, only the Starks' Winter (small, bottom right) and the Lannisters' Hear (big, middle) are present. All the wordclouds are, however, very descriptive of their respective houses. For instance, for the Night's Watch, a military order sworn to protect The Wall, words like protect, wildling and swear are present. The same can be said for House Targaryen, where the main Targaryen character, Daenerys, is married to a Dothraki warlord and later in the show leads the Dothraki people herself.

### Wordclouds based on seasons


## Sentiment of characters

In this second part of the text analysis, we will perform a sentiment analysis of the characters, again based on both their wiki pages and their dialogue in the series. As we saw for the selected characters, there was quite a difference between the wordclouds based on the wiki pages and those based on the dialogue. It will be interesting to see whether this also results in different sentiment levels for the characters. Additionally, we will do a sentiment analysis of the different seasons of the series, to determine whether any season differs significantly in sentiment.

For the sentiment analysis, we will apply both the dictionary-based method LabMT and the rule- and dictionary-based method VADER. For further explanation of how these sentiment scores are computed and the difference between the two methods, we again refer to the Explainer Notebook. It should be noted that the scores of the two methods differ: LabMT scores sentiment on a scale from 1 to 9, while VADER scores on the range [-1, 1]. For LabMT, a score of 5 is considered neutral, while for VADER a score within the range [-0.05, 0.05] is considered neutral.
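To make the LabMT scale concrete: the score is simply the average happiness rating of the tokens that appear in the lexicon. The three-word lexicon below is an illustrative stand-in for the real LabMT list of roughly 10,000 crowd-rated words, and the function name is ours (a VADER score would instead come from a rule-based analyzer such as `nltk.sentiment.SentimentIntensityAnalyzer`):

```python
def sentiment_labmt(tokens, lexicon):
    """Average happiness (scale 1-9) of the tokens found in the lexicon."""
    rated = [lexicon[t] for t in tokens if t in lexicon]
    return sum(rated) / len(rated) if rated else None

# Toy ratings on the LabMT 1-9 scale; the real lexicon is crowd-sourced.
lexicon = {"love": 8.42, "winter": 4.84, "death": 1.54}

# Unrated words ("is", "coming") are simply ignored by the average.
score = sentiment_labmt(["winter", "is", "coming", "death"], lexicon)
```

Note that a text containing no lexicon words gets no score at all, which is one reason the dictionary coverage of LabMT matters for short dialogue snippets.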

### Sentiment analysis of character dialogue


### Sentiment analysis on character wiki pages

{% raw %}
# Reverse the saddest lists before building the plot dictionaries
sadest_VADER.reverse()
sadest_LabMT.reverse()

plot_dict_vader = {key: char_sentiment_VADER[key] for key in happiest_VADER + sadest_VADER}
plot_dict_LabMT = {key: char_sentiment_LabMT[key] for key in happiest_LabMT + sadest_LabMT}

fig = plot_VADER_LabMT_scores(plot_dict_vader, plot_dict_LabMT)

fig.show()
{% endraw %}

### Sentiment analysis on the series' seasons

{% raw %}
import os

# Collect the cleaned wiki text for each of the eight seasons
char_season_wiki = {}
base_path = "/work/got2/s"
for s in range(1, 9):
    txt = []
    season_dir = base_path + str(s) + "_cleaned/"
    for file in os.listdir(season_dir):
        with open(season_dir + file, "r") as text_file:
            txt.extend(text_file.readlines())
    char_season_wiki["s" + str(s)] = txt
{% endraw %} {% raw %}
tokens_LabMT = {char : [lemmatizer.lemmatize(word) for word in word_tokenize(" ".join(text).lower())] for char, text in char_season_wiki.items()}
tokens_VADER = char_season_wiki

# Suppress warnings:
import warnings
warnings.filterwarnings("ignore")

# Compute sentiment for each season:
char_sentiment_LabMT = {season: sentiment_LabMT(tokens) for season, tokens in tokens_LabMT.items()}
char_sentiment_VADER = {season: sentiment_VADER(tokens) for season, tokens in tokens_VADER.items()}

# Sort and find the top 3 happiest and saddest seasons:
happiest_VADER = sorted(char_sentiment_VADER, key=lambda i: char_sentiment_VADER[i], reverse=True)[:3]
happiest_LabMT = sorted(char_sentiment_LabMT, key=lambda i: char_sentiment_LabMT[i], reverse=True)[:3]

sadest_VADER = sorted(char_sentiment_VADER, key=lambda i: char_sentiment_VADER[i], reverse=False)[:3]
sadest_LabMT = sorted(char_sentiment_LabMT, key=lambda i: char_sentiment_LabMT[i], reverse=False)[:3]

print('Happiest based on VADER: ', happiest_VADER)
print('Happiest based on LabMT: ', happiest_LabMT)
print('Saddest based on VADER: ', sadest_VADER)
print('Saddest based on LabMT: ', sadest_LabMT)
Happiest based on VADER:  ['s3', 's2', 's4']
Happiest based on LabMT:  ['s1', 's3', 's6']
Saddest based on VADER:  ['s8', 's7', 's5']
Saddest based on LabMT:  ['s8', 's7', 's4']
{% endraw %}

### Dispersion plot
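A dispersion plot marks the positions in a text at which each target word occurs, making it easy to see where in the series a word is used. NLTK offers `nltk.draw.dispersion_plot` for the drawing itself; the underlying computation is just the token offsets, sketched here without any plotting (the example text and function name are ours):

```python
def word_offsets(tokens, targets):
    """Map each target word to the token positions where it occurs."""
    return {w: [i for i, t in enumerate(tokens) if t == w] for w in targets}

tokens = "winter is coming and winter is here".split()
offsets = word_offsets(tokens, ["winter", "here"])
```

Each list of offsets becomes one horizontal row of tick marks in the rendered plot, with the x-axis running over token positions.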

